Voice Frequency Synthesis using VAWGAN based Amplitude Scaling for Emotion Transformation

Hyochan Lee; Hyunhak Song; Sungyoon Cho; Kiwon Kwon; Sunghyun Park; Taeho Im; Hye-Jeong Kwon; Min-Jeong Kim; Ji-Won Baek; Kyungyong Chung

TIIS (Korean Society for Internet Information)

Korean Title	Voice Frequency Synthesis using VAWGAN based Amplitude Scaling for Emotion Transformation
English Title	Voice Frequency Synthesis using VAWGAN based Amplitude Scaling for Emotion Transformation
Author	Hyochan Lee; Hyunhak Song; Sungyoon Cho; Kiwon Kwon; Sunghyun Park; Taeho Im; Hye-Jeong Kwon; Min-Jeong Kim; Ji-Won Baek; Kyungyong Chung
Citation	Vol. 16, No. 2, pp. 713-725 (Feb. 2022)
Korean Abstract
English Abstract	Artificial intelligence systems generally show no clear change in emotion, which makes it hard for them to convey empathy when communicating with humans. Modifying the frequency of neutral speech, or adding the frequency characteristics of a different emotion to it, makes it possible to develop artificial intelligence that expresses emotion. This study proposes emotion conversion based on voice frequency synthesis with a Generative Adversarial Network (GAN). The proposed method extracts frequency features from the speech data of twenty-four actors and actresses: it extracts the voice features of their different emotions, preserves the linguistic features, and converts only the emotion. It then generates a frequency with a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to produce prosody while preserving linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency by applying amplitude scaling. Spectral conversion on a logarithmic scale yields a frequency that takes the characteristics of human hearing into account. The proposed technique thus converts the emotion of speech so that artificially generated voices can express emotion.
Keyword	Aid to Navigation; Maritime; Computer Vision; Object Detection; High-Speed Processing; Emotion Transformation; Generative Adversarial Network; Voice Frequency Synthesis; Voice Analysis
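The abstract names two signal-level steps, amplitude scaling and spectral conversion on a logarithmic scale, without giving formulas. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes simple peak normalization as the amplitude scaling and a decibel magnitude spectrum as the logarithmic conversion; the function names `amplitude_scale` and `log_spectrum` and the 440 Hz test tone are hypothetical choices made for illustration.

```python
import numpy as np


def amplitude_scale(signal, target_peak=1.0):
    """Rescale a waveform so its largest absolute sample equals target_peak.

    Assumption: plain peak normalization stands in for the paper's
    amplitude scaling, which the abstract does not define.
    """
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal.copy()  # silent input: nothing to scale
    return signal * (target_peak / peak)


def log_spectrum(signal, eps=1e-10):
    """Return the magnitude spectrum in decibels (a logarithmic scale).

    Assumption: 20*log10(|FFT|) stands in for the paper's log-scale
    spectral conversion; eps avoids taking the log of zero.
    """
    magnitude = np.abs(np.fft.rfft(signal))
    return 20.0 * np.log10(magnitude + eps)


# Hypothetical input: one second of a quiet 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
quiet = 0.25 * np.sin(2 * np.pi * 440 * t)

scaled = amplitude_scale(quiet, target_peak=1.0)

# With a one-second signal, FFT bin k corresponds to k Hz, so bin 440
# holds the tone. Scaling the waveform shifts that bin by a constant in dB.
shift_db = log_spectrum(scaled)[440] - log_spectrum(quiet)[440]
print(f"level shift at 440 Hz: {shift_db:.2f} dB")
```

On a logarithmic scale a multiplicative gain becomes an additive offset, so scaling the amplitude changes the level of the spectrum without changing its shape. That is one reason a log-domain representation is convenient when a correction step should alter loudness but leave the spectral content alone.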